Reproducible Research in Science

a scrapbook

Friday 08 May 2015

What is this talk about?

Purpose of reproducibility in science

Sound familiar?

Irreproducible results

Humorous no longer

image (retraction histogram)

https://nsaunders.wordpress.com/2015/03/24/pubmed-retraction-reporting-update/

Examples of results that could not be duplicated

Note, the next two clippings are from newspapers, not scientific journals

image

Shady dealings in the world of medicine and grantsmanship

image

http://www.forbes.com/sites/fayeflam/2015/01/22/investigator-offers-lessons-from-precision-medicines-cancer-scandal/

image

http://retractionwatch.com/2012/02/14/the-anil-potti-retraction-record-so-far/

Statisticians wearing white hats

image image

Example of erroneous finding; dangerously close to home

image

A parable about trying to do the right thing

How statistical analysis really happens

(contrary to what we tell our students)

image

Hadley Wickham, Simply Statistics Unconference http://t.co/D931Og8mq3

How statistical analysis really happens

(contrary to what we tell our students)

image image

Hadley Wickham, Simply Statistics Unconference http://t.co/D931Og8mq3

http://www.quora.com/What-is-data-munging

I had to explain this concept to students

A Biostatistics paper about false discovery rates

image image image

Jager and Leek paper attracts attention

Jager and Leek used the tools of reproducible research

Weaponising reproducibility

Is reproducibility universally good?

Does anybody care about this?

Watch the entire workshop

http://sites.nationalacademies.org/DEPS/BMSA/DEPS_153236

What about on this side of the pond?

image

The Academy of Medical Sciences, jointly with the BBSRC, MRC and Wellcome Trust, held a symposium on 1-2 April 2015 to explore the challenges and opportunities for improving the reproducibility and reliability of biomedical research in the UK.

http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/

What does this have to do with me?

Glimpse at reproducible research tools

Github

Recall this scenario

Markdown
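The transformation shown here is driven by knitr: a plain-text R Markdown source with an embedded code chunk is rendered into the formatted exercise and its computed answer below. A minimal sketch of such a source (the chunk label and surrounding text are illustrative, not the actual source of these slides):

````markdown
Obtain an estimate of probability of occupancy per site $(\Psi)$,
assuming perfect detection of the species within each site.

```{r occupancy}
library(RMark, quietly=TRUE)
data(Donovan.7)
n.occupied <- sum(Donovan.7$ch != "00000")
Psi.0 <- n.occupied / nrow(Donovan.7)
```

Based upon `r n.occupied` of the `r nrow(Donovan.7)` sites occupied,
the estimate of occupancy is `r Psi.0`.
````

The inline `` `r ...` `` expressions are evaluated at render time, so the reported numbers can never drift out of sync with the analysis.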

Transforms into

Obtain an estimate of probability of occupancy per site \((\Psi)\), together with an approximate 95% confidence interval for this probability, assuming perfect detection of the species within each site.

library(RMark, quietly=TRUE)
data(Donovan.7)                               # example detection histories, one row per site
N.total <- nrow(Donovan.7)                    # number of sites surveyed
T.occ <- nchar(Donovan.7$ch[1])               # number of sampling occasions
n.occupied <- sum(Donovan.7$ch != "00000")    # sites with at least one detection
Psi.0 <- n.occupied / N.total                 # naive occupancy estimate (assumes perfect detection)
Psi.0.ci <- Psi.0 + c(-1.96, 1.96) * sqrt(Psi.0 * (1 - Psi.0) / N.total)  # normal-approximation 95% CI

Based on 17 of the 20 sites being occupied, the estimate of occupancy is 0.85, with a 95% confidence interval (assuming normality) of (0.694, 1.006). Note that the upper limit exceeds 1, a symptom of the normal approximation failing near the boundary.


Without fitting a model, decide whether the maximum likelihood estimates of an occupancy model fitted to these data, assuming a constant detection probability p, satisfy these equations (here n is the number of sites with at least one detection, N the total number of sites, T the number of occasions, \(\hat{p}_{\cdot}\) the estimated probability of at least one detection at an occupied site, and \(\delta_{\cdot\cdot}\) the total number of detections):

\[\hat{\Psi} = \frac{n}{N\hat{p}_{\cdot}} \]

\[\frac{\hat{p}}{\hat{p}_{\cdot}} = \frac{\delta_{\cdot\cdot}}{nT}\]
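The identities above can be checked numerically. This sketch refits the constant-p occupancy model by direct maximization of the likelihood (it assumes the RMark example data Donovan.7, as in the earlier chunk); both printed pairs should agree up to numerical tolerance:

```r
library(RMark, quietly=TRUE)
data(Donovan.7)
ch <- Donovan.7$ch
N <- length(ch); T <- nchar(ch[1])
d <- sapply(strsplit(ch, ""), function(x) sum(x == "1"))  # detections per site
n <- sum(d > 0)                                           # sites with >= 1 detection
delta <- sum(d)                                           # total detections

# Negative log-likelihood of the constant-p occupancy model,
# parameterised on the logit scale so optimisation is unconstrained
negll <- function(par) {
  Psi <- plogis(par[1]); p <- plogis(par[2])
  -sum(ifelse(d > 0,
              log(Psi) + d*log(p) + (T - d)*log(1 - p),
              log(Psi*(1 - p)^T + 1 - Psi)))
}
fit <- optim(c(0, 0), negll)
Psi.hat <- plogis(fit$par[1]); p.hat <- plogis(fit$par[2])
p.dot <- 1 - (1 - p.hat)^T     # probability of at least one detection

c(Psi.hat, n/(N*p.dot))        # first identity
c(p.hat/p.dot, delta/(n*T))    # second identity
```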

Making irreproducibility a thing of the past

Editorial policy of journals

image

image

An increasing number of initiatives aim to encourage scientists to ensure that their software is replicable. Courses run by organizations such as the non-profit Software Carpentry Foundation teach the value of writing and sharing solid scientific code, as well as the principles of constructing it. Software packages such as iPython and knitr make it easier to document code creation transparently and in its research context.

http://www.nature.com/news/rule-rewrite-aims-to-clean-up-scientific-software-1.17323

How can statisticians address the reproducibility issue?

image

Fix “challenged” science when it is conducted

not when it is reported

One proposal coming from the National Academy workshop:

PNAS

Leek and Peng 2015: Proceedings of the National Academy of Sciences http://www.pnas.org/cgi/doi/10.1073/pnas.1421412111

Summary